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Metadynamics is a powerful computational tool to obtain the free energy landscape of complex systems. The Monte Carlo algorithm 
lias proven useful to calculate thermodynamic quantities associated with simplified models of proteins, and thus to gain an ever- 
increasing understanding on the general principles underlying the mechanism of protein folding. We show that it is possible to 
couple metadynamics and Monte Carlo algorithms to obtain the free energy of model proteins in a way which is computationally 
very economical. 
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1. Introduction 

Metadynamics is an algorithm which coupled to molecu- 
lar dynamics provides an efficient tool to obtain the energy 
landscape of systems displaying large energy barriers, and 
thus whose sampling by standard tools is, at best, prob- 
lematic. It is based on the knowledge of few slow collec- 
tive variables Si of the system and on the use of a non~ 
]VIarkovian potential U{si) that disfavors the exploration 
of regions of th e phase space already visited by the system 
ijLaio and Parri ncUo (2002)). This algorithm has been suc- 
cesfully used to o btain the free energ y of molecular systems 
at atomic detail ( Babin et al.l ( 20061 )). 

In the case of simplified protein models, where the 
atomic structure of each amino acid is coarse-grained, 
it is common to sample the conformational space with 
the help of Monte Carlo algorithms. Such an approach 
is computationally more economic and more simple to 
implement than the corresponding molecular dynamics 
algori t hm (see, e.g. ls^hirnadji_ctLa lj (l200ll ). Kussell ct al. 



Of course, other modifications of the straight Monte 
Carlo sampling have been developed during the last tens of 
years, including simulated tempering, multicanonical sam- 
pling, parallel tempering, etc. All of them are aimed at pre- 
venting the system to get trapped in free energy minima. 
In the following we show that Monte Carlo metadynamics 
is efficient, accurate and particularly easy to implement. 

We apply a scheme to the calculation of the free en- 
ergy, as a function of the RMSD, of a small domain 
protein, namely Src - S H3. It is a wid e ly studied domain 



Grantcharova et al.l (|l998l)lYi et al.l (|l998DlRiddle et al 



WM)) of the Proto -oncogene tyrosine-protein kinase Src, 



( 2002h . IShimada and ShakhnovichI (j2002r )V It is then natu 



ral to try to extend metadynamics so as to make it possible 
to couple it to a Monte Carlo algorithm. 
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a 536 residue protein that plays a multitude of roles in cell 
signalling. Src is involved in the control of many functions, 
including cell adhesion, growth, movement and differenti- 
ation. SH3 is a domain built out of 60 residues, displaying 
mainly /^-strands (see Fig. [1]). From calorimetry and flu- 
orescence experiments, it is known to fold according to 
a two-state mechanism, that is, populating at biologi- 
cal temperature mainly two states (the native and the 
unfolded state) ( Grantcharova and Baker ( 1997? )). Conse- 
quently, we expect the free energy landscape to display 
two minima separated by a barrier. 

In Section O we present the protein model used in the 
simulations along with a working description of the algo- 
rithm. We devise a formal proof of the correctness of the 
method in Section [3] and then test it in the specific case of 
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Fig. 1. Src protein (Proto-oncogene tyrosine-protein kinase Src). The 
upper right part (dark, red onUne) is the SH3 domain. 

Src-SH3 in Section [H 



2. Method 

The model employed in the simulations describes the 
protein as a chain of beads centered on the Cq of the protein 
backbone (see Fig. [J]). The allowed moves are the flip-move 
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Fig. 2. Schematic protein model used in the simulations. The whole 
protein is shown, and in particular the linear Ca chain (dark, red 
online). Also visible is the sidechain (light blue online), which is 
although not used in the model. 

and tail flip. The interactions are described by a Go-model 
(|GoI (^1975)), where the only contacts participating in the 
potential energy calculation are the native c ontacts. 



Each Monte Ca. r lo step , we apply a Metropolis algorithm 
( Metropolis et al.l ( 1953f )) where the transition probability 



is given by 
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that is, the probability with which the next Monte Carlo 
move is accepted is calculated on the variation of the en- 
ergy of the system, plus the variation of the metadynamics 
potential. 

3. Theory 

During each fragment of trajectory after the update of 
the non-Markovian potential at each time T, the collective 
variable s explores a region A{T). If one makes the critical 
assumption that the dynamics has been able to visit this 
region so extensively that ergodicity holds, then the prob- 
ability distribution of the collective variable is 

-/3(F(s) + [/(s,T))] 



P{s,T) 



exp[ 



(2) 



exp[-/3(i^(s') + C/(s',T))] 

After the end of this sampling, the non-Markovian poten- 
tial is updated, and the new potential reads 



U{s, T + t)^ U{s, T) + WtP{s, T) 



(3) 



where W is the heigth of the energy added to the non- 
Markovian potential. Further assuming that W is small, 
that is that the new term does not perturb in an impor- 
tant way the shape of the potential U , then the previous 
equation can be rewritten as 

dU{s,T) 



dT 



W- 



exp[-/3(i^(s) + [/(s,T))] 



(4) 



Once the free energy landscape is completely filled by 
the non-Markovian energy, then the growth of this non- 
Markovian energy will be independent on s, that is 
dU/dT = W/A, or equivalently 



exp[-/3(F(s) + [/(s,oo))] = 



\Jjs' exp[-f3{F{s') + U{s',^))] , 



(5) 



where A is the whole interval spanned by the collective 
variable. Integrating by saddle-point evaluation, leads to 

F{s) = 



t/(s, oo) 



rMog 



27r 



P\F"{so) + U"{so)\ 



Eve ry r steps of the Monte Carlo sampling (IMetropolis and Ulam 
:he non-Markovian energy contribution is up- 
dated by adding a Gaussian hill with height W and spread 
Ss, centered around the current values of the collective 
variables. 



+ F{so) + U{so) , 



(6) 



where sq is defined by U'{so) = — F'(so). The last equa- 
tion states that the free energy of the system is, except 



for an additive constant, equal to the opposite of the non- 
Markovian potential. A nice property of this algorithm is 
that the obtained free energy depends logarithmically on 
any additive error in the determination of U{s, T + t) (i.e., 
if one adds e(s)r to Eq. (jl]), one obtains an additive term 
loge(s) in F{s)). 

4. Results 

In order to obtain a reference free energy landscape as a 
function of the RMSD for comparison to the metadynamics 
reconstructed landscapes, we first carried out a fairly long 
classical Monte Carlo simulation (90 billions of Monte Carlo 
Steps (MCS)). The free energy calculated at temperature 
9 = 0.625 (slightly below the folding temperature, defined 
as the temperature at which the volume of the native basin 
is equal to that of the denatured basin) is displayed in Fig. 
[3l After 80 billion steps the root mean square difference 
between the landscape at time T and at time T — AT, 
where AT = 10000 MCS, was constantly below 10"^ A, 
indicating that the free energy is likely to have reached its 
equilibrium shape. 
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Fig. 3. Reference energy landscape as a function of the RMSD, 
obtained from a 90 billions of step long Monte Carlo simulation. 
Shown in the insets are a typical folded configuration (F) , a compact 
unfolded one (U) and an elongated unfolded one (right). 

The landscape presents a fairly broad barrier between 
the folded and the unfolded states (marked with B in Fig. 
[3]) . Also shown in Fig. [3] are a typical folded configuration 
(the shown configuration has a RMSD of 3.11 A, less than 
the distance between two consecutives Cq in the protein 
sequence), a compact unfolded one, and an elongated un- 
folded configuration. 

To be able to quantify the degree of convergence of the 
free energy landscapes reconstructed by Monte Carlo meta- 
dynamics, we calculate the standard deviation between the 
reconstructed landscape after T steps and the reference one 



min J fl. 



Uraetaix) - fMc{x)) dx 



where fmeta{x) and fMci^) are the reconstructed and ref- 
erence landscapes respectively, while Rmin and Rmax define 
the range of the RMSD over which a is calculated. They 
are chosen so as to englobe in the calculation the most sig- 
nificant fraction of the landscape, the corresponding values 
being 2 and 16 A, respectively. The reason why the edges 
of the landscape are not included in the calculation is that 
they are both noisy, as they are seldom visited by the sys- 
tem, aside from corresponding to high values of the free 
energy k9, where k is the Boltzmann constant and 9 
is the temperature) , and consequenlty not interesting from 
the thermodynamic point of view. 

The two collective variables used are the RMSD and the 
radius of gyration Rg . We then proceed to integrate out the 
radius of gyration 



F{RMSD) = -6* log / dRg exp 



-F{RMSD, Rg 



(8) 

where 9 is the simulation temperature, so as to have a sim- 
pler, one-dimensional, visualization of the free energy land- 
scape. The simulations are carried out at different values 
of the height W of the Gaussian terms added to the non- 
Markovian potential and of their deposition time r, with 
each Gaussian having a fixed standard deviation Ss = 0.25. 

In Fig. [4] we show the reconstructed free-energy land- 
scape as a function of the RMSD at different values of the 
number of MCS elapsed in the simulation (with W — 0.01 
and T — 200000). At the beginning (see the inset) the pro- 




(7) 
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Fig. 4. Expected and reconstructed energy landscapes for a typical 
simulation. Shown in continuous line is the expected energy land- 
scape. It can be appreciated how, after having reached the minimum 
value for a, the reconstructed landscapes (here shown one every 2 bil- 
lions of steps, starting from 6 billions) have small oscillations around 
the expected one. (Inset) Expected and reconstructed energy land- 
scapes, ranging from the reconstructed landscape after about 100 
millions of steps, to the one at the minimum tr, after 3 billions of 
stops. 

tein explores mainly the regions around 3 and 11 A, pro- 
ducing the two minima associated with the native and the 
denatured state. After these have been filled by the non- 
Markovian term, the rest of the landscape is refined and 
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converges to the reference one within 3 x 10^ MCS (to be 
compared with the 8 x 10^*^ MCS needed by the standard 
Monte Carlo simulation). 

The corresponding values of <j are displayed in Fig. [5] as 
a function of the number of MCS. The simulation reaches 
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Fig. 5. o" as a function of time. It can easily be seen how it goes 
down really fast, and then starts to oscillate in a quite small zone. 

a fairly low a (under 0.2 A) in a few billions MCS, and 
then oscillates (with a spread of less than ^ 0.15 A) around 
~ 0.1 A. 

Fig. [6] shows the dependence of a on W ^ at fixed t = 
200000. The value of a plotted here corresponds to the 
averag e of the last 2 x 10*^ MCS of the simulation (cf. Fig. 
O. The plot shows a steep increase of a with respect to 
the height of the hills (note the logarithmic axis scale), 
indicating that only a fine-grained deposition of the non- 
Markovian term is able to drive the system to equilibrium. 
This is consistent with the fact that Eq. ^ is derived by 
Eq. ([3]) as an expansion for small W ^ and consequently fails 
when W is increased. 
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(T as a function of W for a fixed r. 



In Fig. [7] the dependence of a on W It i s sho wn. It was 
shown by Laio and coworkers ( Laio et al. ( 20051 )) that the 
lower is W Jt , the higher the accuracy of standard metady- 
namics is. The data shown indicate that also in Monte Carlo 
metadynamics the accuracy of the reconstructed landscape 



increases when WjT is decreased. In particular, it seems 
that a is related to this ratio by a linear function (the lin- 
ear fit indicates a slope of 5.3 x 10^, with a correlation of 
0.955). 



0.6 h 
0.5 
'0.4 
0.3 
0.2 
0.1, 



2e-07 



4e-07 



6e-07 



8e-07 



Fig. 7. (T as a function of W/r. (dashed, red online) Linear fit, 
with intercept 0.189, slope 5.319e-|-05, standard deviation 0.170 and 
correlation coefficient 0.955. 

Fig. [8] shows how a varies for different values of r, at 
fixed W. As T increases, a becomes smaller, as expected 
from the theorerical discussion carried out in Section [3l In 
fact, the larger is r, the more likely it is for the system to 
having explored a region A(T) exhaustively and thus for 
Eq. ([2]) to be a good approximation of the actual probability 
distribution. 




100 200 300 400 500 600 700 800 
T [thousands of steps] 

Fig. 8. cr as a function of t for two different, fixed, W. In particular, 
W = 0.02, continuous line (red online) and W = 0.04, dashed line 
(blue online). 



5. Conclusion 

The metadynamics strategy has been implemented 
within a Monte Carlo scheme in order to take benifit from 
the positive aspects of both approaches. The algorithm is 
tested with a simplified protein model, and results par- 
ticularly efficient and accurate in reconstructing the free 
energy landscape of the protein. 
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